Abstract

It has often been speculated that bacterial protein-tyrosine kinases (BY-kinases) evolve rapidly and maintain relaxed substrate 
specificity to quickly adopt new substrates when evolutionary pressure in that direction arises. Here, we report a phylogenomic 
and biochemical analysis of BY-kinases and their relationship to substrates, aimed at validating this hypothesis. Our results suggest that 
BY-kinases are ubiquitously distributed in bacterial phyla and underwent a complex evolutionary history, affected considerably by 
gene duplications and horizontal gene transfer events. This is consistent with the fact that the BY-kinase sequences show a high 
level of substitution saturation and have a higher evolutionary rate compared with other bacterial genes. On the basis of similarity 
networks, we could classify BY-kinases into three main groups with 14 subgroups. Extensive sequence conservation was observed 
only around the three canonical Walker motifs, whereas unique signatures suggested functional speciation and diversification 
within some subgroups. The relationship between BY-kinases and their substrates was analyzed using a ubiquitous substrate (Ugd) 
and some Firmicute-specific substrates (YvyG and YjoA) from Bacillus subtilis. No evidence of coevolution between kinases and 
substrates at the sequence level was found. Seven BY-kinases, including well-characterized and previously uncharacterized ones, 
were used for experimental studies. Most of the tested kinases were able to phosphorylate substrates from B. subtilis (Ugd, YvyG, and 
YjoA), despite originating from very distant bacteria. Our results are consistent with the hypothesis that BY-kinases have evolved 
relaxed substrate specificity and are probably maintained as rapidly evolving platforms for adopting new substrates. 

Key words: phylogeny, bacterial protein kinases, kinase evolution, kinase classification, BY-kinases, kinase-substrate 
coevolution. 



Introduction 

Protein phosphorylation is a widespread posttranslational 
modification and plays a key role in the regulation of cellular functions. 
Enzymes that perform protein phosphorylation, termed 
protein kinases, transfer phosphate groups from ATP to reactive 
side chains of amino acids in proteins. This usually 
changes the enzyme activity, cellular localization, or interaction 
with partners of the target protein (Hanks and Hunter 
1995). Tyrosine is one of the phosphorylatable amino acids. In 
Eukarya, tyrosine phosphorylation is carried out by Hanks-type 
kinases (Hanks et al. 1988). In Bacteria, the presence of 
tyrosine kinases went undetected until the mid-1990s 
(Grangeasse et al. 1997). Since then, a number of BY-kinases 
have been identified, sharing a structure quite distinct from 
Hanks-type kinases. These tyrosine kinases have been unified 
in a new, bacteria-specific class of enzymes, named 
BY-kinases (Grangeasse et al. 1997). A prototype BY-kinase 
contains an extracellular loop and a cytosolic domain 
(Grangeasse et al. 2007, 2012). These two domains can 
appear linked into one large protein encoded by a single 
gene (e.g., in Escherichia coli) or exist as two proteins: one 
transmembrane and another cytosolic protein, encoded by 
two adjacent genes (e.g., in Bacillus subtilis). The cytosolic 
domain, defined as catalytic domain (CD), contains the cata- 
lytic site and performs the phosphorylation on tyrosine. The 
intracellular juxtamembrane region of the transmembrane 
domain (following the second transmembrane helix) is essen- 
tial for the activation of the CD. Thus, this domain was defined 
as the transmembrane activator domain (TAD) (Jadeau et al. 
2008). The CD possesses Walker A and A' motifs at the N terminus, 
followed by a Walker B motif in the center and a tyrosine-rich 
cluster at the C terminus. The three Walker motifs constitute 
an active site required for ATP-Mg binding (the P-loop) 
(Walker et al. 1982; Doublet et al. 1999; Soulat et al. 2007; 
Grangeasse et al. 2012), which is significantly different from 
that used by Hanks-type kinases but usually found in ATP/ 
GTPases (Leipe et al. 2002; Grangeasse et al. 2007, 2012). 
A nucleotide-binding motif similar to that of BY-kinases has 
been found in arsenite ATPases (ArsA) and MinD proteins, 
which led to the hypothesis that they have all evolved from 
the same ancestral bacterial ATPase (Grangeasse et al. 2012). 
The tyrosines in the C-terminal cluster represent the autophosphorylation 
sites of BY-kinases. The sequence of the C-terminal 
cluster is not conserved among BY-kinases; it can vary 
considerably with respect to overall length and the number and position 
of tyrosines (Grangeasse et al. 2012). It has been shown 
that no single tyrosine in this region is essential for autophosphorylation 
(Paiment et al. 2002). 

Most of the experimentally validated BY-kinases are 
encoded by genes located in large operons, which are in- 
volved in biosynthesis and export of capsular/extracellular 
polysaccharides (Whitfield 2006). In E. coli, the first identified 
BY-kinase, Wzc, is encoded by a gene of the cps (or wca) 
operon that participates in biosynthesis of exopolysaccharides 
(Vincent et al. 1999). Loss of autophosphorylation of Wzc 
abolished capsule assembly, leading the authors to conclude 
that its autokinase activity is essential for this process 
(Wugeditsch et al. 2001). BY-kinases were often found to 
affect virulence or resistance to cationic antimicrobial pep- 
tides, which are both associated with capsular polysaccharide 
synthesis (Stingele et al. 1996; Morona et al. 2004; Morona 
et al. 2006; Minic et al. 2007). More recently, it was 
understood that BY-kinases also act via substrate phosphory- 
lation. The first described substrate for BY-kinases was the 
UDP-glucose dehydrogenase (Ugd, Grangeasse et al. 2003). 
A recent study indicated that Ugd phosphorylation, mediated 
by BY-kinases Wzc or Etk, respectively, represents the control 
element for extracellular polysaccharides production and re- 
sistance to polymyxin of E. coli (Lacour et al. 2008). BY-kinases 
are not only related to polysaccharide biosynthesis; they are 
also involved in other processes such as lysogenization, heat 
shock response, DNA replication, cell cycle, and others (Klein 
et al. 2003; Lacour et al. 2006, 2008; Petranovic et al. 2007; 
Kolot et al. 2008). In E. coli, Etk affects heat shock response 
through phosphorylation of a heat shock sigma factor RpoH 
(Klein et al. 2003). In B. subtilis, the BY-kinase PtkA phosphorylates 
the single-stranded DNA-binding protein SsbA and influences 
DNA replication and the cell cycle (Petranovic et al. 2007). 
Other substrates of PtkA (such as single-stranded DNA exo- 
nuclease YorK, aspartate semialdehyde dehydrogenase Asd, 
and transcription factor FatR) have since been described, 
clearly defining it as a promiscuous kinase (Jers et al. 2010; 
Derouiche et al. 2013). Considering that BY-kinases 
accomplish different tasks by phosphorylating different 
substrates, one cannot help but wonder how one kinase 
recognizes different substrates with totally different sequences 
and structures, and how a new kinase-substrate pair 
emerges in terms of evolution. 

To better understand the evolution of BY-kinases and 
the relationship between kinases and their substrates, we per- 
formed a phylogenomic analysis of the BY-kinases that led us 
to conclude that BY-kinases have a complex evolutionary history, 
mainly driven by horizontal gene transfer (HGT) and duplications, 
with fast evolution due to a higher nonsynonymous 
substitution rate. Although coevolution was observed in 
some kinase-substrate pairs (Gildor et al. 2005; Skerker 
et al. 2008), no evidence of coevolution at the sequence 
level was detected between BY-kinases and their substrates, 
in particular the Ugd family proteins. Nevertheless, our results 
suggest that BY-kinases have the capability to phosphorylate 
the same set of substrates across very distant bacterial phyla. 

Materials and Methods 

Identification of BY-Kinases and Ugd Family Proteins 

The complete sequences of 1,471 bacterial and 117 archaeal 
genomes available in March 2012 were downloaded from the 
National Center for Biotechnology Information FTP website 
(ftp.ncbi.nih.gov, last accessed April 1, 2014). The use of complete 
genomes is essential to phylogenomic approaches because 
it allows determining the exact distribution of 
homologs. In addition, we compared the taxonomic distribu- 
tion with those built on sequences from the nr database (data 
not shown). No important differences were evident between 
the two analyses, meaning that the use of complete genomes 
does not bias the interpretation of results. To identify BY- 
kinase homologs from these genomes, the annotated BY- 
kinase sequences were first downloaded from the dedicated 
BY-kinase database (Jadeau et al. 2012), and these are re- 
ferred to as the BYKdb data set. Because this data set was 
constructed from UniProt database and was redundant, a 
nonredundant BY-kinase data set was generated using the 
CD-hit program (Li and Godzik 2006) with cutoff of 0.7. The 
resulting sequences were aligned with ClustalW (Larkin et al. 
2007), and the HMM profile corresponding to CD region was 
generated using the HMMER package (Eddy 2011). The 
HMMER package (Eddy 2011) and self-written scripts were 
then used to search for BY-kinase homologs in the complete 
bacterial and archaeal genomes, requiring the presence of the 
CD. Alignments with an E value less than 0.01 were first considered 
significant, and the resulting sequences were then 
filtered with the isBYK algorithm (Jadeau et al. 2012) to iden- 
tify the BY-kinase homologs. The corresponding sequences 
were subsequently searched against the Pfam 25.0 database 
(Finn et al. 2010) to determine the presence of additional 
known functional domains. For each BY-kinase homolog, 
the gene context, defined as the 10 neighboring genes (located 
upstream and downstream of the BY-kinase gene, -5 to +5), 
was investigated with cluster of orthologous group (COG) 
(Tatusov et al. 2000) classification using self-written scripts. 
The identification of the Ugd family protein homologs in com- 
plete bacterial and archaeal genomes was performed using 
the HMMER package (Eddy 2011) requiring the presence of 
three functional domains in the following order (from N to C- 
terminal): UDPG_MGDP_dh_N (PF03721), UDPG_MGDP_dh 
(PF00984), and UDPG_MGDP_dh_C (PF03720). Alignment 
scores higher than the cut_tc threshold were considered 
significant. 
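
The gene-context definition used above (five genes on either side of the BY-kinase gene) can be illustrated with a minimal Python sketch. The function name `gene_context` and the representation of a replicon as an ordered list of gene identifiers are assumptions made for illustration only; the self-written scripts mentioned in the text are not described, and circular replicons are not handled here.

```python
def gene_context(ordered_genes, target, flank=5):
    """Return the genes flanking `target` on a linear, ordered gene list.

    ordered_genes: gene identifiers in chromosomal order (hypothetical input).
    target: identifier of the BY-kinase gene.
    flank: number of neighbors to collect on each side (-5 to +5 by default).
    """
    i = ordered_genes.index(target)
    # Clip at the start of the list so a negative index never wraps around.
    upstream = ordered_genes[max(0, i - flank):i]
    downstream = ordered_genes[i + 1:i + 1 + flank]
    return upstream, downstream


if __name__ == "__main__":
    genes = [f"g{k}" for k in range(20)]
    up, down = gene_context(genes, "g10")
    print(up, down)
```

Each neighbor returned this way could then be looked up in a COG assignment table, which is the step the text describes.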

Phylogenetic Analysis of BY-Kinase and Ugd Family 
Proteins 

The CDs from the retrieved BY-kinase homologous sequences 
were first aligned using MAFFT v6.860b (Katoh and Toh 2008) 
with the iterative global alignment option. The resulting alignment 
was then filtered to remove the positions that contained 
more than 30% gaps. A phylogenetic tree of all 
BY-kinases was constructed using FastTree version 2.1 
(Price et al. 2010) with MinD proteins (AAC74259, 
CAB14759) as outgroups, and 100 resampled trees were 
generated to calculate the local bootstrap values. An in-depth 
phylogenetic analysis using a more restricted sequence 
sample (54 sequences, 187 positions, named BYKsel) 
representative of the diversity of BY-kinases was performed 
using the Bayesian approach implemented in the MrBayes 3.2 
program (Ronquist and Huelsenbeck 2003). For the Bayesian analysis, 
a mixed substitution model with a gamma law (four rate 
categories) and a proportion of invariant sites was used. 
The Markov chain Monte Carlo search was run with four 
chains for 1,000,000 generations. Trees were sampled every 
100 generations, and the first 1,000 trees were discarded as 
"burnin." 

For Ugd phylogenetic trees, organisms possessing both 
Ugd and BY-kinase were selected. Two restricted samples 
were defined: one (72 sequences, 220 positions) containing 
Ugd family proteins from organisms in the BYKsel data set and 
another (105 sequences, 200 positions) containing Ugd 
family proteins encoded near BY-kinase genes (at most 
five genes upstream or downstream). Each sample was aligned 
using the ClustalW program (Larkin et al. 2007), followed by the 
selection of unambiguous parts with the Gblocks 0.91b program 
(Castresana 2000). The Bayesian phylogenetic trees were then 
performed as described above for BY-kinases. 
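
The gap-filtering step applied to the BY-kinase alignment (removal of positions with more than 30% gaps) can be sketched as follows. This is an illustrative Python sketch, not the authors' script: the function name `drop_gappy_columns`, the list-of-strings alignment format, and the gap characters are assumptions.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.30, gap_chars="-."):
    """Remove alignment columns whose gap fraction exceeds `max_gap_fraction`.

    alignment: list of equal-length aligned sequences (strings).
    Returns a new list of sequences containing only the retained columns.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n_seqs = len(alignment)
    keep = []
    for col in range(length):
        gaps = sum(seq[col] in gap_chars for seq in alignment)
        # A column is kept when its gap fraction does not exceed the threshold.
        if gaps / n_seqs <= max_gap_fraction:
            keep.append(col)
    return ["".join(seq[c] for c in keep) for seq in alignment]


if __name__ == "__main__":
    toy = ["AC-G", "A--G", "ACTG", "AC-G"]
    print(drop_gappy_columns(toy))
```

In the toy alignment, the third column is 75% gaps and is dropped, whereas the second column (25% gaps) is retained.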



Substitution Saturation and Evolutionary Rate Estimation 

A phylogenetic tree was constructed from the BYKsel data set using the maximum likelihood (ML) 
method implemented in the PHYML program (Guindon et al. 2010). 
The ProtTest program (Abascal et al. 2005) identified the LG 
model with an estimated gamma-distribution parameter (G) 
and a proportion of invariant sites (I) as the best-fit amino 
acid substitution model. PHYML with the "LG+I+G" model 
was then used for ML reconstruction with a 100-replicate 
bootstrap. The amino acid substitution saturation was evaluated 
by comparing the number of substitutions inferred by the ML 
method with the number of observed differences. The inferred 
substitutions were calculated as the sum of all branch 
lengths between two sequences (Leclere and Rentzsch 2012). 
The DNA sequences were aligned according to the corresponding 
amino acid alignments. The substitution saturation at the DNA 
level was assessed with DAMBE according to Xia's method (Xia 
and Lemey 2009). PAML's codeml (Yang 2007) was used to 
estimate the nonsynonymous rate (dN), the synonymous rate (dS), 
and the ratio of these rates (dN/dS). Gapped regions were 
excluded to avoid spurious rate inference. The phylogenetic 
trees inferred with FastTree (Price et al. 2010) were used as 
constraint trees. For codon sequences from the BYKsel data set 
and other conserved genes in bacterial lineages (frr, infC, 
nusA, pyrG, rplB, rplD, rplL, rplM, rplN, rplP, rplS, rplT, 
rpmA, rpoB, rpsB, rpsE, rpsJ, rpsM, rpsS, smpB, and tsf), only 
one model of protein evolution was used: model 0, which allows a 
single ω (dN/dS) value throughout the genealogy. To determine 
whether the BY-kinases were subjected to selection, the 
codon alignment from the BYKsel data set was tested. The 
presence of positive selection was detected as described by 
Yang (1998). Likelihood ratio tests were used to compare 
model M0 (a single ratio) with M3 (discrete), 
M1a (nearly neutral) with M2a (positive selection), and M7 (β distribution) 
with M8 (β and ω). The P value was calculated using the 
chi2 program in PAML. Substitution saturation impacts the 
estimation of dN/dS (Gharib and Robinson-Rechavi 2013). 
To avoid the influence of substitution saturation, the codon 
sequences of BY-kinases from three closely related species 
(Streptococcus pneumoniae TIGR4, Str. suis SC84, and 
Eubacterium eligens ATCC27750) with a low level of substitution 
saturation were tested for positive selection using the 
same method. 
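
The likelihood ratio tests between nested site models can be illustrated with a short Python sketch. It is not a substitute for the chi2 program in PAML. The log-likelihood values in the example are invented, and the degrees of freedom (4 for M0 vs. M3 with three site classes, 2 for M1a vs. M2a and for M7 vs. M8) are the values commonly used for these comparisons, stated here as an assumption because the text does not list them. The closed-form survival function below is valid only for even degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Chi-square survival function P(X >= x) for even degrees of freedom.

    For df = 2m: P = exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (2*delta lnL, P value) for a null model nested in an alternative."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf_even_df(max(stat, 0.0), df)


if __name__ == "__main__":
    # Hypothetical log-likelihoods for an M1a (null) vs. M2a (alternative) comparison.
    stat, p = likelihood_ratio_test(-1000.0, -997.0, df=2)
    print(f"2*dlnL = {stat:.2f}, P = {p:.4f}")
```

A P value below the chosen significance level would favor the alternative model, for example M2a over M1a as evidence of positively selected sites.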

Reconstruction of Protein Similarity Network 

The protein similarity networks (PSNs) were constructed as 
described (Zhang et al. 2011). Briefly, pairwise alignments 
(entire BY-kinase sequences or CDs) were performed with 
the bl2seq program from the BLAST package (Altschul et al. 
1997). Then a series of E-value thresholds were applied for the 
selection of sequence pairs with significant similarity. Next, the 
distribution of pairwise alignment E-values was used to define 
the optimal E-value cutoff. Finally, significant alignments (with 
an E-value less than the optimal cutoff) were used to construct 
the PSN. Each node in the PSN indicates a BY-kinase, and the 
edge indicates that two nodes share significant similarity with 
an E-value less than the selected cutoff. The network was 
visualized using Cytoscape (Shannon et al. 2003) with the 
yFiles organic layout. 
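
The network construction described above (nodes are BY-kinases, edges connect pairs whose E-value is below the cutoff) can be sketched in Python as follows. The input format, the function names, and the E-values in the example are hypothetical. Grouping nodes by connected components is shown only as one simple way to read clusters from such a network; the text does not state that the groups were derived this way.

```python
from collections import defaultdict


def build_psn(pairwise_evalues, cutoff):
    """Build an undirected similarity network as an adjacency dictionary.

    pairwise_evalues: dict mapping (seq_a, seq_b) -> E-value of their alignment.
    cutoff: an edge is drawn only when the E-value is below this value.
    """
    graph = defaultdict(set)
    for (a, b), evalue in pairwise_evalues.items():
        graph[a]  # touching the key registers the node even without edges
        graph[b]
        if a != b and evalue < cutoff:
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def connected_components(graph):
    """Return the connected components of the network as a list of sets."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


if __name__ == "__main__":
    evalues = {("A", "B"): 1e-50, ("B", "C"): 1e-30,
               ("C", "D"): 1e-5, ("D", "E"): 1e-40}
    net = build_psn(evalues, cutoff=1e-20)
    print(sorted(sorted(c) for c in connected_components(net)))
```

Raising or lowering the cutoff merges or splits components, which is why the choice of the optimal E-value threshold matters for the resulting grouping.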

Consensus Sites in BY-Kinases 

The consensus sites were determined based on the multiple 
alignments with the "50-10" rule (Carretero-Paulet et al. 2010). A 
strongly conserved site was designated if an amino acid at that 
site was present in all (100%) sequences. Sites with an amino 
acid present in more than 50% of the sequences were designated 
as weakly conserved sites. For such a weakly conserved 
site, an amino acid was also added to the 50-10 consensus 
sequence if it was present in more than 10% of all the sequences. 
Sequence logos were generated with the WebLogo program 
(Crooks et al. 2004). For each PSN subgroup, a significant 
conserved motif was defined as an amino acid region with 
at least two strongly conserved sites spanning a window of 10 
amino acids. 
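
One possible reading of the "50-10" rule described above is sketched below in Python. The function name, the output format, and the treatment of gaps (counted in the denominator but never reported as a residue) are assumptions for illustration and may differ from the procedure actually used.

```python
from collections import Counter


def consensus_50_10(alignment, gap="-"):
    """Classify each alignment column under a 50-10 style rule.

    Returns a list of (label, residues) tuples, one per column:
      ("strong", X)    residue X is present in 100% of sequences
      ("weak", "XYZ")  X is present in >50%; Y, Z are each present in >10%
      ("none", "")     no residue exceeds 50%
    """
    n = len(alignment)
    result = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != gap)
        if not counts:
            result.append(("none", ""))
            continue
        top, top_count = counts.most_common(1)[0]
        if top_count == n:
            result.append(("strong", top))
        elif top_count / n > 0.5:
            # Add every other residue found in more than 10% of sequences.
            minor = sorted(r for r, c in counts.items()
                           if r != top and c / n > 0.1)
            result.append(("weak", top + "".join(minor)))
        else:
            result.append(("none", ""))
    return result


if __name__ == "__main__":
    seqs = ["GKA"] * 4 + ["GKV"] * 2 + ["GRV", "GRL", "GSL", "G-L"]
    print(consensus_50_10(seqs))
```

In the toy alignment of 10 sequences, the first column is strongly conserved (G), the second is weakly conserved with K as the major residue and R added because it occurs in 20% of sequences, and the third has no residue above 50%.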

Vector Construction and Growth Conditions 

Genomic DNA was extracted by DNeasy Blood & Tissue Kit 
(Qiagen). PCR was performed with specific primers (supple- 
mentary table S1, Supplementary Material online) and the 
corresponding genomic DNA. All PCR products were inserted 
into pQE-30 Xa (Qiagen) to get the 6xHis-tag fusion proteins. 
Escherichia coli NM522 was used for cloning, and the E. coli M15 
(pREP4::GroEL/GroES) strain was used for protein synthesis. 
Cells were routinely grown in LB medium, with addition of 
ampicillin (100 µg/ml) and kanamycin (25 µg/ml) when 
needed; 1 mM IPTG was used to induce expression of 
cloned genes. 

Protein Purification and Phosphorylation Analysis 

Protein synthesis and purification were carried out as described 
previously (Mijakovic et al. 2003). Induction was 
started at OD600 = 0.6, and cells were harvested 3 h later and 
sonicated. 6xHis-tagged proteins were purified by Ni-NTA affinity 
chromatography (Qiagen) from crude extracts and 
desalted on PD-10 columns (GE Healthcare). For in vitro phosphorylation 
analysis, reactions were performed in a solution 
containing 50 mM Tris pH 7.5, 100 mM NaCl, 5 mM MgCl2, 
5% glycerol, and 50 µM ATP with 20 µCi/mmol [γ-32P]-ATP, 
incubated at 37 °C for 1 h, and then separated on 12% SDS-polyacrylamide 
gels. The concentrations of BY-kinases, substrates, 
and antimonite in all assays are indicated in the figures of this article. 
The radioactive signals were revealed by autoradiography, 
using the FUJI phosphoimager as described previously 
(Mijakovic et al. 2003). 



Homology-Based Modeling of 3D-Structures 

For six sequences of BY-kinases that were used for experimental 
studies, a template sequence with a known structure was 
searched using the FASTA algorithm against the PDB database on 
the NPS@ web server (Combet et al. 2000). The molecular 
models were computed using the Geno3D tools (Combet 
et al. 2002), following a modeling process under restraints 
similar to the process used for experimental solving of protein 
structure by NMR. The Staphylococcus aureus sequence was 
not modeled as its structure was experimentally solved (PDB 
code 3BFV [Olivares-Illana et al. 2008]). Molecular models and 
the experimentally solved structure were analyzed and com- 
pared with the DeepView (Guex and Peitsch 1997) and the 
Matt software (Menke et al. 2008). 

Results 

Distribution of BY-Kinases in Bacteria 

To explore the general occurrence of BY-kinases in bacteria, a 
large-scale in silico analysis of BY-kinase genes was performed 
on the available complete bacterial and archaeal genomes. 
Experimentally characterized BY-kinases contain a CD with 
or without a TAD (referred to as PF02706 in the Pfam database). 
However, the structural core of BY-kinases is defined by 
three Walker-like motifs (A, A', and B) and a C-terminal tyro- 
sine cluster (R1), which are all parts of the CD. Furthermore, 
the TADs do not have particularly conserved distinguishing 
features besides the transmembrane helices. Thus, in this 
study, we first defined an HMM profile from the CD. By 
using the CD profile as a query, we detected 796 BY-kinase 
homologs (fig. 1 and supplementary table S2, Supplementary 
Material online) and, as their name indicates, all of them were 
found in bacterial genomes but not in Archaea. These 796 BY-kinase 
homologs are present in 577 of 1,471 (39.2%) bacterial 
genomes. The average number of BY-kinases per BY-kinase-containing 
genome is about 1.4 (796/577), and 72.5% of the organisms contain only one copy 
per genome. However, there can be up to eight copies in 
Burkholderia strains (supplementary table S2, Supplementary 
Material online), which indicates the possibility of gene expansion 
and duplication. BY-kinases are found in members 
of all bacterial phyla except Chlamydia, Aquificales, and 
Epsilonproteobacteria (fig. 1). BY-kinase abundance is high 
in four phyla: Gammaproteobacteria, Firmicutes, 
Betaproteobacteria, and Alphaproteobacteria, whereas the 
numbers of organisms harboring BY-kinases are fairly low in 
some phyla (e.g., 37 out of 168 in Actinobacteria; 2 out of 36 
in Spirochaetes). In the four phyla with high BY-kinase abun- 
dance, BY-kinases are found in most of the orders (e.g., six out 
of nine orders in Firmicutes and 10 out of 15 orders in 
Gammaproteobacteria), whereas in other phyla, the presence 
of BY-kinases is limited (e.g., three out of seven orders in 
Cyanobacteria) (supplementary table S2, Supplementary 
Material online). Extensive variation in distribution of 
BY-kinases was also observed at the genus level, because 
less than 50% of genera in each phylum harbor BY-kinases, 
except in Acidobacteria, Betaproteobacteria, and Planctomycetales 
(supplementary table S2, Supplementary Material 
online). The heterogeneous distribution of BY-kinases is consistent 
with the experimental evidence that their genes are 
nonessential in bacteria and are often involved in 
particular processes in specific organisms or conditions, which 
include heat-shock response, polymyxin resistance, virulence, 
and detoxification of polyunsaturated fatty acids (Klein et al. 
2003; Lacour et al. 2006; Morona et al. 2006; Derouiche et al. 
2013). In addition, the extremely low percentage of BY-kinases 
in some phyla or genera also indicates the possibility 
that HGT events occurred during evolution. Interestingly, 
there are 15 BY-kinase genes (2 in Firmicutes and 13 in 
Proteobacteria) found in plasmids, whereas the majority 
(98.1%) of BY-kinase-encoding genes are located within the 
chromosome (fig. 1), which also supports the notion of HGT 
acquisition. 

Fig. 1. — The distribution of BY-kinases over the main bacterial phyla. For each main bacterial phylum, columns represent: Org, the number of 
organisms that harbor BY-kinases (total number of studied organisms); Genus, number of genera with organisms possessing BY-kinases (total number of genera 
in the phylum); BY-kinase, number of BY-kinases in the phylum; CD and TAD-CD, number of CD and TAD-CD-type BY-kinases, respectively; Chr and 
plasmid, number of BY-kinases located within the chromosome and plasmid, respectively. For CD, TAD-CD, Chr, and plasmid columns, the numbers in 
brackets correspond to the number of organisms. 

Among the identified putative BY-kinases, 545 (from 383 
organisms) belong to the TAD-CD type, containing both TAD and 
CD in a single protein encoded by one gene; 
and 252 (from 206 organisms) are of the CD type, in which 
TAD and CD belong to two separate proteins and are encoded 
by two genes (fig. 1). Previous reports usually described CD 
type and TAD-CD type as the Firmicutes type and 
Proteobacteria type, respectively (Jadeau et al. 2008). Our sys- 
tematic study confirms that 98.7% of BY-kinases from 
Firmicutes identified in this study are of the CD type and 
96% from Proteobacteria are of the TAD-CD type, and yet 
the distribution of these two types of BY-kinases is not restricted 
to these two phyla. In most bacterial phyla, a proportion 
bias is found in favor of one particular BY-kinase type; 
for example, there is only 1 CD-type versus 83 TAD-CD-type 
BY-kinases in Alphaproteobacteria (fig. 1). Interestingly, the 
abundance of CD- and TAD-CD-type BY-kinases is balanced 
in Deltaproteobacteria, with 16 BY-kinases of the CD type and 
18 of the TAD-CD type. For the CD type, in most cases (230 out of 
252), the gene encoding the CD was colocalized with the gene 
encoding the corresponding TAD (supplementary tables S2 
and S3, Supplementary Material online), consistent with the 
notion that the presence of both is necessary to reconstitute 
a fully functional enzyme. We also found a few 
orphan CD-type BY-kinases in some organisms (supplementary 
tables S2 and S3, Supplementary Material online), where 
there is no corresponding TAD-encoding gene flanking the 
genes for those orphan CD-type kinases. It has been shown 
that the TAD is essential for CD activity (Mijakovic et al. 
2003); therefore, the mechanism and biological function of 
those orphan CD-type kinases will be quite interesting to investigate. 
We speculate that these orphan CD-type kinases 
may be activated by the kinase-modulator proteins (TAD) 
from another CD-type kinase in the organisms harboring 
both, for example, Clostridium acetobutylicum ATCC 824. 

Evolutionary History of BY-Kinases 

For the same reason mentioned previously, we performed the 
phylogenetic analysis of BY-kinases based on the CD. 
Owing to the large number of BY-kinase sequences and 
the limited number of unambiguously aligned positions (178 
amino acids) available for phylogenetic analysis, the rooted 
tree of 796 BY-kinases is only partially resolved, especially at 
the most basal nodes (supplementary fig. S1, Supplementary 
Material online). The exhaustive phylogenetic tree showed 
that all the BY-kinases formed a monophyletic clade with 
MinD from E. coli and B. subtilis as outgroups, which sug- 
gested that all these BY-kinases have evolved from a 
common origin. Although the order of emergence of the 
main bacterial phyla is not fully resolved, the overall BY- 
kinase tree does not fit the reference bacterial tree, because 
several phyla were split into different clusters. The resulting 
tree is composed of four major clades, as well as several minor 
ones. Interestingly, several phyla were split into different 
clusters (e.g., two clusters for Firmicutes, three clusters for 
Gammaproteobacteria, three clusters for Alphaproteobacteria, 
and three clusters for Betaproteobacteria). To gain 
more insight in the evolutionary history of BY-kinases, a 
subset of 54 sequences representative of their phylogenetic 
diversity was used to perform an in-depth phylogenetic anal- 
ysis. The overall topology of the resulting Bayesian tree was in 
agreement with the general BY-kinase tree (fig. 2), with equally 
limited resolution in some basal nodes. Two main well-sup- 
ported clades (labeled 1 and 2) and several smaller groups 
were identified (with posterior probability >0.5), correspond- 
ing to major bacterial phyla. Clade 1 is mainly composed of 
BY-kinases from Firmicutes, for which all of the members are 
exclusively of the CD type. Clade 2 mainly contains BY-kinases 
of the TAD-CD type, from Gammaproteobacteria, Betaproteobacteria, 
Cyanobacteria, Bacteroidetes, Actinobacteria, and 
Deltaproteobacteria. Moreover, BY-kinases from Cyanobacteria 
formed a well-supported small clade, BY-kinases from 
Deltaproteobacteria were split into clade 2 and other unclassified 
groups, and Actinobacteria BY-kinases fell into two 
separate clusters, which was in agreement with the previous 
global analysis (supplementary fig. S1, Supplementary Material 
online). 

As mentioned before, gene expansion and duplication of 
BY-kinases have been observed, and the perfect showcase 
for these events is the genus Burkholderia (from 
Betaproteobacteria). Twenty-five sequenced Burkholderia 
strains harbor 78 BY-kinases of the TAD-CD type. BY-kinases 
are particularly abundant in Burkholderia phytofirmans PsJN (6 
BY-kinases), Burkholderia sp. CCGE1002 (6 BY-kinases), and 
Bu. xenovorans LB400 (8 BY-kinases) (supplementary tables S2 
and S4, Supplementary Material online). To determine the 
origin of multiple BY-kinases in the Burkholderia common 
ancestor, an additional phylogenetic analysis based on the CD was 
performed and resulted in six distinct well-supported subclades 
(Burk 1-6) (supplementary fig. S2 and table S4, 
Supplementary Material online). Some organisms, such as 
Bu. xenovorans LB400, contain two or more copies of BY-kinases 
within the same subclade (supplementary table S4, 
Supplementary Material online), but most of the organisms 
(19 out of 25) have multiple copies of BY-kinases from different 
subclades. This suggests that the Burkholderia common 
ancestor probably already contained an expansion of BY-kinases, 
which was followed by recent duplications in some 
strains. Moreover, the presence of only one copy of BY-kinase 
in most organisms from the Burk6 subclade (23 out of 25) 
suggests that one of the ancestral BY-kinase genes is likely 
to be from this subclade. It is worth mentioning that 
Burkholderia usually have multiple chromosomes, and the 
extra copies of BY-kinase genes are usually located in different 
chromosomes and even in plasmids. Considering the genome 
size variation and high genomic plasticity in the genus 
Burkholderia (Chain et al. 2006), the accumulation of BY-kinases 
may be the result of genome-scale expansions. 

To check whether the CD regions of the two types of BY-kinases, CD 
and TAD-CD, have similar evolutionary histories, we performed 
an in-depth phylogenetic analysis of the BY-kinases 
from Deltaproteobacteria, which possess a balanced set of 
BY-kinases of both types. The phylogenetic analysis of 
Deltaproteobacteria BY-kinases based on the CD and 
the full sequences clearly shows that all BY-kinases from this 
phylum fall into two well-supported clades (posterior probabilities 
of 1), one corresponding to the CD type and the other 
to the TAD-CD type (supplementary fig. S3, Supplementary 
Material online). Moreover, the speciation between CD-type 
and TAD-CD-type kinases from the same organism (e.g., 
Geobacter sp. M18 and Desulfobacca acetoxidans DSM) 
suggests that CDs of each BY-kinase type have different 
evolutionary histories, even in the same organism. In addition, 
because most Geobacter species harbor only one CD-type BY-kinase 
gene (supplementary fig. S3, Supplementary Material 
online), the occurrence of both BY-kinase types in Geobacter 
sp. M18 and Geobacter uraniireducens Rf4 suggests possible 
HGT acquisition of the TAD-CD-type protein in those strains 
during evolution. 

Fig. 2. — Bayesian phylogenetic tree of 54 BY-kinase sequences representative of the BY-kinase diversity. MinD proteins (AAC74259, CAB14759) were 
used as outgroups. Leaves are colored according to the main bacterial phyla. The colored dots at the nodes indicate the posterior probabilities. Nodes supported 
with a posterior probability >0.9 are indicated by a red dot, whereas nodes supported with a posterior probability >0.5 are indicated by a blue dot. The scale 
bar represents the average number of substitutions per site. The well-supported major clades are marked with 1 and 2. 

As shown earlier, the less-well supported deeper nodes of 
our phylogenetic trees restricted the tracking of the BY-kinase 



evolutionary history to full extent. Previous analyses have sug- 
gested that for the proteins with a high degree of mutational 
saturation, such as AAA+ superfamily of P-loop NTPase, 
which contains Walker A and Walker B motifs, accumulation 
of mutations at the same positions throughout time generate 
more noise than signal for molecular phylogeny (Gribaldo and 
Philippe 2002; Leclere and Rentzsch 2012). Because
BY-kinases belong to this superfamily, we examined the
mutational saturation of these enzymes by amino acid substitution

Fig. 3. Amino acid substitution analysis of BY-kinases. The level of
substitution saturation is evaluated by the ratio between the inferred number
of substitutions (X axis) and the observed differences (Y axis). Dots on the
straight line Y=X correspond to completely unsaturated sequence
pairs.



analysis. Amino acid substitution is considered saturated if
the number of observed differences remains constant while the
inferred number of substitutions increases. As shown in
figure 3, most of the sequence pairs are located within the
plateau, indicating that substitution has already
reached saturation in a number of BY-kinases. Moreover, the
nucleotide substitution saturation was also detected among 
these species (data not shown). Although we cannot exclude 
alternative explanations at present, this finding suggests that 
BY-kinases may evolve fast, and the substitution level in BY- 
kinase genes has reached saturation even between some clo- 
sely related species. Faster evolution may explain the weak 
confidence levels in the deeper branches of our phylogenetic 
trees and supports the idea suggested previously that BY-ki- 
nases are open to fast evolution to adopt new substrates 
(Mijakovic and Macek 2012). To further test the hypothesis 
of fast evolution of BY-kinases, the synonymous and
nonsynonymous substitution rates were estimated with PAML. As
shown in supplementary table S5, Supplementary Material
online, BY-kinases have a significantly higher nonsynonymous
substitution rate (dN = 0.1607) compared with 21 conserved
genes in bacterial lineages (from 0.0358 to 0.1034) chosen as
reference. Most proteins are dominated by purifying selection
(i.e., the removal of functionally deleterious mutations),
and their nonsynonymous rate is lower than the synonymous
rate (dN/dS < 1), independently of the evolution model.
To investigate the evolutionary forces that shaped BY-kinases,
the dN/dS ratio was estimated. As shown in supplementary
table S6A, Supplementary Material online, strong purifying
selection pressure results in a low dN/dS value. Because
substitution saturation could bias the estimation of dN/dS,
we restricted our analysis to three BY-kinases (from Str.
pneumoniae TIGR4, Str. suis SC84, and Eu. eligens ATCC27750, three
closely related strains with a low level of substitution saturation).
A low dN/dS value (ω = 0.00283 for model M0) was observed,
and no positively selected sites were identified in those three
kinases (supplementary table S6B, Supplementary Material
online). All this evidence indicates that purifying selection,
and not positive selection, was the major driving
force in the evolution of the BY-kinase family.
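
The saturation test described above compares, for every sequence pair, the observed proportion of differing sites with an inferred number of substitutions. The following is a minimal sketch of that comparison, not the procedure used in this study: it assumes a simple Poisson correction (d = -ln(1 - p)) as the inferred distance, whereas a real analysis would normally use model-based distances taken from the tree. The function names and the toy alignment are hypothetical.

```python
import math
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Observed proportion of differing residues, skipping gap columns."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("sequences share no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def poisson_distance(p: float) -> float:
    """Inferred substitutions per site under a Poisson correction (assumed model)."""
    if p >= 1.0:
        return math.inf
    return -math.log(1.0 - p)


def saturation_points(alignment: dict[str, str]) -> list[tuple[float, float]]:
    """Return one (inferred, observed) point per sequence pair."""
    points = []
    for a, b in combinations(alignment.values(), 2):
        observed = p_distance(a, b)
        points.append((poisson_distance(observed), observed))
    return points


# Hypothetical toy alignment (not data from this study).
toy_alignment = {
    "seq1": "MGKSTVAANL",
    "seq2": "MGKSTIAANL",
    "seq3": "LGKTTV-SNM",
}

for inferred, observed in saturation_points(toy_alignment):
    print(f"inferred={inferred:.3f} observed={observed:.3f}")
```

Pairs that lie close to the line Y=X are unsaturated; the further the inferred value exceeds the observed one, the more multiple substitutions at the same site are hidden in the pairwise comparison.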

Classification of BY-Kinases 

Because the phylogenetic approach could not provide an
exhaustive classification of all BY-kinase sequences, we used
the PSN as an alternative strategy, because it was shown to
recapitulate much of the information present in phylogenetic
trees (Atkinson et al. 2009). The emergence of connections
between putative clusters was examined with different
alignment E-value cutoffs from 10^-20 to 10^-100. Permissive
cutoffs (e.g., 10^-20) collapsed all sequences into one single
cluster without any outliers (supplementary fig. S4A,
Supplementary Material online), whereas more stringent
cutoffs (e.g., 10^-100) broke the data set into small, disconnected
groups (supplementary fig. S4B, Supplementary Material
online). From the distribution of the number of pairwise
alignments with decreasing E-value cutoffs (from 10^-20 to 10^-100)
(supplementary fig. S4C, Supplementary Material online), an
optimal E-value cutoff of 10^-55 was determined, and the
resulting pairwise alignments kept the majority (92%) of
BY-kinases. As shown in figure 4, the analysis produced two
major groups and also a number of peripheral clusters. Most
of the sequences grouped together according to taxonomic
relationship, which was consistent with the previous phylogenetic
results. We propose to define them as groups A, B, and C. In
Group A, three subgroups were identified: subgroups A1, A2,
and A3 mainly include BY-kinases from Actinobacteria,
Cyanobacteria, and Firmicutes, respectively. Group B
mainly includes proteins from Gammaproteobacteria
(subgroup B1) and Betaproteobacteria (subgroup B2). Group
C contains nine subgroups (C1-C9) representing seven
bacterial phyla and a miscellaneous ungrouped cluster (shaded
box in fig. 4). In summary, numerous phyla including
Alphaproteobacteria (C2, C3, and C5), Gammaproteobacteria (B1
and C4), Actinobacteria (A1 and C8), Firmicutes (A3 and C6),
and Bacteroidetes (C7 and C9) are each mainly represented by at
least two subgroups. Furthermore, according to the overall
taxonomic information of each subgroup (supplementary table S7,
Supplementary Material online), some of these subgroups are
mainly related to a particular genus (e.g., C6 to the genus
Streptococcus and C8 to the genus Bifidobacterium), whereas
some of the subgroups contain a small number of BY-kinases
belonging to other phyla.

Fig. 4. The PSN reconstructed from the CD of the BY-kinases. Sequences are represented by nodes, and the nodes from the same taxonomic groups
are in the same color. The edges are colored on a gray scale: the darker the edge, the more significant the similarity. A1 to C9 represent the BY-kinase
subgroups determined by the PSN reconstruction. Miscellaneous BY-kinases are shown in the shaded area.
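
The effect of the E-value cutoff on the network can be sketched as follows. This is an illustrative toy, not the pipeline used for figure 4: sequences are nodes, a pairwise hit becomes an edge only if its E-value is at or below the cutoff, and clusters are the connected components of the resulting graph. The node names, E-values, and the function `clusters_at_cutoff` are hypothetical.

```python
def clusters_at_cutoff(nodes, hits, cutoff):
    """Connected components when only edges with E-value <= cutoff are kept."""
    parent = {n: n for n in nodes}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for (a, b), evalue in hits.items():
        if evalue <= cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    # Largest clusters first; ties broken by member names for stable output.
    return sorted(groups.values(), key=lambda g: (-len(g), sorted(g)))


# Hypothetical pairwise alignment E-values (not data from this study).
nodes = ["k1", "k2", "k3", "k4", "k5"]
hits = {
    ("k1", "k2"): 1e-80,
    ("k2", "k3"): 1e-60,
    ("k3", "k4"): 1e-30,
    ("k4", "k5"): 1e-110,
}

for cutoff in (1e-20, 1e-55, 1e-100):
    print(cutoff, clusters_at_cutoff(nodes, hits, cutoff))
```

In this toy, the permissive cutoff merges everything into one cluster, the intermediate cutoff separates two groups, and the stringent cutoff leaves mostly singletons, which mirrors the qualitative behavior described in the text.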



Consensus Sites in BY-Kinase Sequences 

To examine the conservation of residues in BY-kinase
sequences, the consensus sites were determined from the multiple
alignments with the "50-10" rule (Carretero-Paulet et al. 2010).
As shown in figure 5A, only seven strongly conserved
sites (100% identical at the position) are found in the entire
data set of CDs from 796 BY-kinases, and they are all
situated in the three Walker motifs. These sites include "GK"
in the Walker A motif, "DXDXR" in the Walker A' motif, and
"DXXPX" in the Walker B motif, which is not surprising
because those three motifs are included in the isBYK algorithm
and have also been experimentally shown to participate
directly in ATP hydrolysis and ATP-Mg interaction (Soulat et al.
2007; Lee et al. 2008; Olivares-Illana et al. 2008). This result is
consistent with the amino acid substitution analysis of BY-kinases
and supports the notion that these enzymes evolve fast and
are conserved only at sites essential for maintaining core
enzymatic activity. In addition, we found several regions (marked
with gray bars in fig. 5A) with a relatively high degree of
conservation. These include regions between the Walker A
and Walker A' motifs, regions around the Walker B motif, and the
Y cluster.
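
The notion of a strongly conserved site (100% identity at a position) can be illustrated with a short sketch. It assumes an already aligned set of equal-length sequences and treats gap characters as non-conserved; it does not implement the "50-10" rule, whose details are given in Materials and Methods. The function name and the toy alignment are hypothetical.

```python
def strictly_conserved_sites(alignment):
    """Return (column_index, residue) for columns with one residue and no gaps."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    sites = []
    for i in range(length):
        column = {seq[i] for seq in alignment}
        if len(column) == 1 and "-" not in column:
            sites.append((i, next(iter(column))))
    return sites


# Hypothetical toy alignment (not data from this study).
toy_alignment = [
    "MGKSTA",
    "LGKTTV",
    "VGKS-A",
]

print(strictly_conserved_sites(toy_alignment))
```

Applying the same function to each subgroup separately, rather than to the whole data set, is the kind of comparison that underlies the per-subgroup counts reported in the next paragraph.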

For each BY-kinase subgroup, we performed the same
analysis as described earlier. As shown in figure 5B, more
strongly conserved sites can be found within each subgroup,
compared with only seven such sites in the total of 796

Fig. 5. Motif and amino acid consensus sites in BY-kinases and related subgroups. (A) Schematic view of amino acid conserved sites in all 796
BY-kinases. Black and gray ticks represent the strongly and relatively strongly conserved sites in BY-kinases, respectively (see Materials and Methods). "GK" in
the Walker A motif, "DXDXR" in the Walker A' motif, and "DXXPX" in the Walker B motif are indicated by pink, orange, and blue ticks, respectively. The three motifs
Walker A, A', and B are marked with green underlines, and the corresponding sequence logos are shown. (B) Schematic maps of the conserved sites of
BY-kinase subgroups (A1 to C9). Strongly and relatively strongly conserved sites over the protein subgroups are indicated by black and gray ticks. Except for the Walker
A, A', and B motifs, the signature motifs in each subgroup are indicated within a red box. The region marked with green underlines contains the conserved
Y-R and Y-H pairs, and their sequence logos are shown in (D). (C) Sequence logos of signature motifs (defined in B) are shown. Motifs located in the same region
of the BY-kinase are displayed in the same box. (D) Sequence logos of motifs related to the Y-R and Y-H interactions in subgroup B1 (also underlined in
green in B).

Fig. 6. Distribution and functional classification of genes surrounding
the BY-kinases. For each of the five neighboring genes (located
upstream and downstream of the BY-kinase), the functional COG category
was determined. For each neighboring position, the bar indicates the
frequency of each functional gene type (represented here by COG category)
among the overall categories at the same position. The bar on the right
represents the COG distribution in all genomes harboring BY-kinases.
COG categories are shown in different colors (see the COG color
legend) and are associated with the corresponding capital letters: A,
RNA processing and modification; B, chromatin structure and dynamics;
C, energy production and conversion; D, cell cycle control, cell division, and
chromosome partitioning; E, amino acid transport and metabolism; F,
nucleotide transport and metabolism; G, carbohydrate transport and
metabolism; H, coenzyme transport and metabolism; I, lipid transport and
metabolism; J, translation, ribosomal structure, and biogenesis; K,
transcription; L, replication, recombination, and repair; M, cell wall/membrane/
envelope biogenesis; N, cell motility; O, posttranslational modification,
protein turnover, chaperones; P, inorganic ion transport and metabolism;
Q, secondary metabolites biosynthesis, transport, and catabolism; R,
general function prediction only; S, function unknown; T, signal transduction
mechanisms; U, intracellular trafficking, secretion, and vesicular transport;
V, defense mechanisms; W, extracellular structures; Y, nuclear structure; Z,
cytoskeleton.

BY-kinases. For example, subgroup B1 contains 13 strictly
conserved sites and subgroup A3 has 10. From these strongly
conserved sites, signature motifs were defined (supplementary
fig. S5, Supplementary Material online), and the 10 most
significant of them are illustrated (red boxes in fig. 5B) with the
corresponding sequence logos in figure 5C. Besides strongly
conserved sites, more sites with a relatively high degree of
conservation (marked with gray bars in fig. 5B) were also
identified within each subgroup. All these sites seem to
be of functional importance. This is highlighted by the
example of Y574 of Etk and
Y569 of Wzc from E. coli, which have been suggested to
mediate intraphosphorylation in a two-step activation
process (Grangeasse et al. 2002; Lu et al. 2009). It has been
reported that H576 (located five amino acids downstream of the
DXDXR motif) and R614 (located between the DXDXR and
DXXP motifs) of Etk can interact with phosphorylated Y574
(located three amino acids downstream of the DXDXR motif)
(Lee et al. 2008), and that R614 is critical for Etk kinase activity.
On the basis of this, we looked for correlation in the presence of
these amino acids in each subgroup (supplementary table S8,
Supplementary Material online) and found that the Y574-R614
and Y574-H576 pairs of amino acids were conserved only in
subgroup B1 (fig. 5D). Within this subgroup, 87% of the
sequences harbor the conserved Y, 89% contain the
conserved R, 89% contain the conserved H, 80% and 78%
possess the Y-R and Y-H pairs, respectively, and 72% harbor
both Y-R and Y-H. The high conservation of both the Y-R and Y-H
pairs suggests that they may play an important role in BY-kinase
intraphosphorylation within subgroup B1, which mainly
includes organisms from Gammaproteobacteria.

Conserved Functional Genomic Context of BY-Kinases 

Experimentally characterized BY-kinases are usually encoded
by genes located in operons involved in the synthesis and export
of capsular or extracellular polysaccharides and have been
described as polysaccharide copolymerases in previous studies
(Cuthbertson et al. 2009; Grangeasse et al. 2012). We sought
to shed more light on the function of BY-kinases by screening
the genomic neighborhoods of their genes. As shown in
figure 6, the genomic context of BY-kinase genes was not
conserved and varied considerably in different genomes.
However, by comparing the COG classification between the
immediate neighbor genes of BY-kinase genes and the entire
genome, we found that the location of BY-kinase genes was not
entirely random. For the proteins encoded by the three
immediate neighbor genes, on average 45.4% were involved in cell
wall/membrane/envelope biogenesis (COG category M), 9.4%
were involved in carbohydrate transport and
metabolism (COG category G), and about 20% were
proteins without hits in COG, independently of
the neighbor position from -3 to +3 except -1. The high
frequency of neighbor genes encoding proteins for cell wall/
membrane/envelope biogenesis (COG category M) is not
related to their proportion within the genomes, because the
overall proportion of COG category M genes across all
genomes is lower than 5%. This finding suggests a functional
enrichment of BY-kinase regions in capsular biosynthesis
proteins, which is consistent with the initial definition of
BY-kinases as a component of the polysaccharide biosynthesis
pathway. The same conclusion can be reached from the analysis
of the genomic-context network associated with BY-kinase genes
(supplementary fig. S6, Supplementary Material online); for
example, BY-kinase (COG0489) was strongly linked to COG
category M (COG3944), which corresponds to capsular
biosynthesis proteins. Moreover, at position -1, proteins related
to signal transduction (COG category T) appear to be among
the most frequent neighbors of BY-kinase genes (fig. 6), and
protein tyrosine phosphatases (COG0394) were linked with
high frequency to BY-kinases (COG3206, another COG to which
BY-kinases were assigned) (supplementary fig. S6,
Supplementary Material online). This result is consistent
with a previous report that BY-kinase genes are often
colocalized with phosphatase-encoding genes (Mijakovic et al. 2003)
and indicates that dephosphorylation performed by these
phosphatases also plays a major role in the regulatory
mechanisms mediated by BY-kinases. However, the genes flanking
BY-kinase genes are also involved in many other biological
processes (supplementary fig. S6, Supplementary Material
online), which is not surprising because the
BY-kinases Wzc from E. coli and PtkA from B. subtilis are known to
participate in many processes besides capsular biosynthesis (Klein
et al. 2003; Lacour et al. 2006, 2008). The variety of the
immediate genomic environment supports the notion that
BY-kinases may play a complex role in bacterial physiology and
may contribute to several distinct signaling pathways in the
same bacterium.
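
The comparison between neighbor genes and the genome-wide background can be sketched as a simple frequency ratio. This is only an illustration under stated assumptions: each gene carries a single COG category letter, the category lists below are invented toy data, and the function names are hypothetical. It is not the procedure used to produce figure 6.

```python
from collections import Counter


def category_frequencies(categories):
    """Fraction of genes assigned to each COG category letter."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}


def enrichment(neighbor_categories, genome_categories):
    """Ratio of neighborhood frequency to genome-wide frequency per category."""
    near = category_frequencies(neighbor_categories)
    background = category_frequencies(genome_categories)
    return {cat: near[cat] / background[cat] for cat in near if cat in background}


# Hypothetical toy data (not data from this study): one letter per gene.
neighbors = ["M"] * 5 + ["G"] * 2 + ["T"] * 3
genome = ["M"] * 5 + ["G"] * 10 + ["T"] * 5 + ["C"] * 80

for cat, ratio in sorted(enrichment(neighbors, genome).items()):
    print(f"COG {cat}: {ratio:.1f}x the genome-wide frequency")
```

A ratio well above 1 for a category, as for M in this toy, is what the text describes as functional enrichment of the BY-kinase neighborhood; a real analysis would also need a statistical test and a treatment of genes with several or no COG assignments.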

Evolutionary Relationship between BY-Kinases and Their 
Substrates 

Ugds were the first substrates of BY-kinases to be discovered
and are the best characterized (Mijakovic et al. 2003; Lacour et al. 2008;
Egger et al. 2010). It has been shown that cognate Ugds are
phosphorylated in vitro by the BY-kinases PtkA from B. subtilis and
Wzc from E. coli, which significantly increases the Ugd
dehydrogenase activity (Grangeasse et al. 2003; Mijakovic et al.
2003). The activation of Ugd depends on phosphorylation of
a specific tyrosine residue which in its nonphosphorylated
form hinders the binding of substrates in the active site
(Petranovic et al. 2009). We sought to elucidate the
evolutionary past of the relationship between BY-kinases and their
substrates, the Ugds. As shown in supplementary table S9,
Supplementary Material online, 2,346 homologs of Ugd
family proteins are present in 1,122 bacterial genomes and
93 archaeal genomes, which means that they are more
widespread than BY-kinases. Furthermore, only 508 organisms
possess both BY-kinases and Ugd family proteins, whereas
614 organisms, distributed in almost all bacterial phyla,
contain Ugd family proteins but no BY-kinases (fig. 7A). The
different phylogenetic profiles suggest that Ugd proteins
are not universally regulated by BY-kinases. Organisms with
both Ugds and BY-kinases are mainly members of
Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, and
Firmicutes (supplementary tables S2 and S9, Supplementary
Material online), the four phyla with high BY-kinase
abundance. There are also 69 organisms without a Ugd but
with BY-kinases; they are mainly members of Firmicutes
(e.g., organisms from the Lactobacillus and Streptococcus
genera) and are found in both the A3 and C6 subgroups
(fig. 1A and supplementary table S2, Supplementary Material
online). Because previous studies have shown that some
kinases coevolved with their substrates, for example, the eukaryal cyclin
Pcl5 and its substrate Gcn4 (Gildor et al. 2005), or the bacterial
histidine kinase and response regulator pairs (Skerker et al.
2008), we performed a phylogenetic analysis of Ugds from
organisms harboring both Ugds and BY-kinases and observed
that the topology of the Ugd phylogenetic tree is very different
from the tree of BY-kinases (fig. 7B). The incongruence
between those trees suggested a complex evolutionary
relationship between BY-kinases and Ugds. Because BY-kinase and
Ugd encoding genes are located in the same gene cluster in
some organisms, we narrowed our focus to 105
such BY-kinase and Ugd co-occurrence pairs. Phylogenetic
analysis of these pairs resulted once again in very different
trees for BY-kinases and Ugd proteins (supplementary fig.
S7, Supplementary Material online). Thus, we could detect
no evidence of coevolution between BY-kinases and Ugds at
the sequence level, despite the fact that their genes are
located in the same gene clusters (operons) and are probably highly
correlated at the expression level.
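
The phylogenetic-profile comparison at the start of this section amounts to set arithmetic over genome lists. The sketch below uses synthetic genome identifiers sized to the bacterial counts quoted in the text (508 with both proteins, 614 with Ugd only, 69 with BY-kinases only); the identifiers and the function name are hypothetical, and only the counts come from the text.

```python
def profile_overlap(with_kinase: set[str], with_ugd: set[str]) -> dict[str, int]:
    """Count genomes carrying both protein families or only one of them."""
    return {
        "both": len(with_kinase & with_ugd),
        "kinase_only": len(with_kinase - with_ugd),
        "ugd_only": len(with_ugd - with_kinase),
    }


# Synthetic identifiers; only the set sizes follow the counts in the text.
both = {f"genome_{i}" for i in range(508)}
ugd_only = {f"genome_{i}" for i in range(508, 508 + 614)}
kinase_only = {f"genome_{i}" for i in range(1122, 1122 + 69)}

with_ugd = both | ugd_only
with_kinase = both | kinase_only

counts = profile_overlap(with_kinase, with_ugd)
print(counts)
print("bacterial genomes with Ugd:", len(with_ugd))
```

The totals are internally consistent: 508 + 614 gives the 1,122 bacterial genomes reported to contain Ugd family proteins.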

Tyrosine 70 of Ugd is phosphorylated in B. subtilis,
and homology-based structure modeling reveals that this
phosphorylated tyrosine residue is positioned at
the N-terminal extremity of helix α4 and is in close proximity
to the N-terminal NAD-binding site (Petranovic et al. 2009;
Egger et al. 2010). To examine whether the Ugd tyrosine
phosphorylation site was conserved, the Ugd proteins were
aligned. As shown in figure 7C, the tyrosine phosphorylation
site of Ugd, which had been characterized in E. coli and B.
subtilis, is present in Klebsiella pneumoniae, Bifidobacterium
animalis, Anaeromyxobacter sp., and Mycobacterium sp.;
however, the position of the phosphorylated tyrosine is not
conserved and varies within the helix α4 region. One possible
explanation is that tyrosine phosphorylation may occur within a
particular region but not at a particular position in a protein
and therefore may shift position during evolution (Moses et al.
2007; Nguyen Ba and Moses 2010). Another explanation is
that proteins with multiple sites may lose or gain some sites
without affecting the regulation of the protein (Moses et al.
2007; Nguyen Ba and Moses 2010). Moreover, the Ugd
phosphorylation site seems to be lost in some organisms (e.g.,
Geobacillus sp. and Nostoc sp. 7120 in fig. 7C), despite the
presence of a BY-kinase in the genome. In such cases, it is
probable that the Ugd activity is no longer under the control of
a BY-kinase.

The BY-Kinase Substrate Specificity across Species 
Boundaries 

To investigate the relationship between BY-kinases and
substrates, we decided to probe experimentally the ability of a
number of evolutionarily distant BY-kinases to phosphorylate
some substrates (including Ugd) of the canonical BY-kinase
PtkA from B. subtilis. In the set of kinases, we included
three well-characterized ones: PtkA from B. subtilis
(Mijakovic et al. 2003), CapB from Sta. aureus
(Olivares-Illana et al. 2008), and CpsD from S. pneumoniae (Morona et al.
2004), and four putative BY-kinases from K. pneumoniae,
Acinetobacter baumannii, Bi. animalis, and Leuconostoc
mesenteroides identified by the BY-kinase database. Among
these kinases, PtkA, CapB, and Byk from L. mesenteroides
belong to group A (subgroup A3); Byk from K. pneumoniae
and Byk from A. baumannii belong to group B (subgroup B1);
and CpsD and Byk from Bi. animalis belong to group C
(subgroups C6 and C8, respectively). Additionally, PtkA, CapB,
CpsD, and Byk from L. mesenteroides are CD type, whereas
the other three are TAD-CD type (fig. 8A). For the CD-type
kinases, PtkA was assayed in the presence
of its modulator TkmA (TAD) (Mijakovic et al. 2003), Byk
from L. mesenteroides was also assayed in the presence of
B. subtilis TkmA in an attempt to complement its function,
whereas CapB and CpsD were expressed as translational
fusions with the activator region of their corresponding
modulators CapA and CpsC (TAD) and purified as CapAB and
CpsCD (Olivares-Illana et al. 2008). For the other three
TAD-CD-type kinases, we cloned the intracellular CDs from the end
of the second transmembrane helix to the stop codon to
maintain the activator region of TAD. First, to get a global
structural overview of the putative kinases, homology-based
modeling was used to generate their structural models. All
models were superimposed with the resolved structure of
CapAB (Olivares-Illana et al. 2008) (fig. 8B). The extensive
overlap between models (core RMSD: 1.371 Å) suggests
that these uncharacterized proteins also exhibit the conserved
BY-kinase fold, in spite of the high sequence divergence.
Autophosphorylation activity of all proteins was examined
by autoradiography. The experiment was carried out first
without TkmA, and we found autophosphorylation activity
for three TAD-CD-type putative BY-kinases (fig. 8C), as well
as for three previously characterized BY-kinases CpsCD,
CapAB, and PtkA. However, there was no detectable
autophosphorylation activity for the CD-type putative Byk from
L. mesenteroides. PtkA exhibited very weak autophosphorylation
due to the absence of its modulator TkmA (Mijakovic
et al. 2003). Thus, the same experiment was repeated in the
presence of B. subtilis TkmA. Although TkmA increased
autophosphorylation of PtkA, Byk from L. mesenteroides still
did not show any autophosphorylation. This finding suggested
that Byk from L. mesenteroides probably requires its cognate
modulator (encoded by the same operon as the kinase)
(fig. 8A) or may not be a functional kinase at all. We also
found that TkmA did not change the autophosphorylation
of any of the other kinases (fig. 8C). Our data set is too
small for drawing definite conclusions, but these findings do
suggest that the kinase-modulator relationship is probably
species specific.

Fig. 7. Relationship of BY-kinases and their substrates Ugds. (A) Distribution of BY-kinases and Ugd family proteins in bacteria. The white square with
the black frame indicates all analyzed genomes; organisms containing BY-kinases are in pink; and organisms harboring proteins from the Ugd family are in blue.
(B) Bayesian tree of a sample of 72 Ugd sequences representative of the Ugd family diversity. Leaves are colored according to bacterial phyla. The colored dots

Next, we examined whether all these kinases can
phosphorylate Ugd from B. subtilis. Previous work showed
that Ugd from B. subtilis can be phosphorylated by BY-kinases
from E. coli (Mijakovic et al. 2003). Our results here
suggest that this was not an isolated phenomenon. Six out of
seven kinases examined here phosphorylated B. subtilis Ugd
(fig. 8D). Byk from L. mesenteroides was the only one that
showed no activity (fig. 8D), which was not surprising
because the kinase was already inactive for autophosphorylation
(fig. 8C). This may explain why BY-kinases and Ugds do not
need to coevolve: they retain the capacity to recognize
each other even across very distant evolutionary boundaries.
To turn to some more idiosyncratic and far less widely
distributed substrates, we also performed the same analysis with
two newly identified substrates of B. subtilis PtkA: YvyG and
YjoA (Jers et al. 2010), which are only present in Firmicutes.
The result was identical to the one obtained for Ugd: six out of
seven kinases (all except the Byk from L. mesenteroides)
phosphorylated YvyG and YjoA (fig. 8E). In conclusion, we propose
that BY-kinases maintain the capability to phosphorylate the
same set of substrates across very distant phyla, probably by
keeping the ability to recognize global structural motifs and
not specific sequences around the target sites.

Regulation by Antimonite as a Shared Mechanism 
between ArsA Proteins and Some BY-Kinases 

Previous studies have revealed that BY-kinases exhibit
significant sequence similarity with ArsA proteins and suggested
that they may have evolved from the same ancestor
(Grangeasse et al. 2012). ArsA ATPase is the cytosolic subunit
of the Ars pump protein in E. coli, which provides resistance to
arsenic or antimony. ArsA contains two nucleotide-binding
sites and an allosteric-binding site for antimonite Sb(III).
Binding of Sb(III) increases the ArsA ATPase activity (Li et al.
1996; Zhou et al. 2000; Walmsley et al. 2001). Presumably,
the ATP-binding domain in ArsA and the CD of BY-kinases
come from the same origin, so we decided to examine
whether the BY-kinase active site could be affected by antimonite.
Interestingly, the activity of Byk from Bi. animalis and CpsCD
from S. pneumoniae increased slightly with the addition of
Sb(III) to the reaction (fig. 9A). The phosphorylation of Ugd
by Byk from Bi. animalis reached the maximum at 5 mM Sb(III),
whereas for CpsCD, the optimum concentration of Sb(III) was



Fig. 7. Continued

a posterior probability >0.8 are indicated by a blue dot. The scale bar represents the number of substitutions per site. (C) Multiple sequence alignment of Ugd
family proteins. Proteins from Escherichia coli, Klebsiella pneumoniae, Bifidobacterium animalis, Mycobacterium sp., Anaeromyxobacter sp., Bacillus subtilis,
Geobacillus sp., Bradyrhizobium sp., and Nostoc sp. were aligned. Only the N-terminal extremity of helix α4, including the possible phosphorylation site (Egger
et al. 2010), is shown. Conserved amino acids are highlighted. The CD-type BY-kinases are indicated by an asterisk (*). The positions of the phosphorylation site
(Y) are marked in gray and indicated with an arrow.

Fig. 8. In vitro phosphorylation of the BY-kinases. An asterisk (*)
indicates a BY-kinase of CD type. (A) Schematic diagram of the gene organization of
BY-kinases and Ugd in Staphylococcus aureus, Streptococcus pneumoniae,
Klebsiella pneumoniae, Acinetobacter baumannii, Bifidobacterium
animalis, Leuconostoc mesenteroides, and Bacillus subtilis. The number
above the double slash indicates the number of intervening genes. (B) The



2 mM, and a further increase in antimonite reduced the kinase
activity. Other BY-kinases were not affected by Sb(III). It is
noteworthy that the structures of the ArsA and BY-kinase
ATP-binding domains overlap extensively (fig. 9B). However,
beyond that domain, there is a low sequence identity (about
20%) between the two enzymes. In particular, the Sb(III)-binding
site of ArsA is outside this region, and no homolog can be
found in BY-kinases. Thus, the molecular mechanism of
activation of BY-kinases by antimonite remains puzzling, even
though it supports the notion of an evolutionary link between
BY-kinases and ArsA ATPases.

Discussion 

In this study, we performed a large-scale in silico analysis of
BY-kinases and their well-known substrates, the Ugd family
proteins. Phylogenomic analysis of these proteins reveals
that BY-kinases are ubiquitously distributed in most bacterial
phyla, except Aquificales, Chlamydia, and
Epsilonproteobacteria. In spite of the partially resolved
nodes at the basal positions of the phylogenetic trees, it can
still be asserted that the evolutionary history of BY-kinases
has been affected considerably by HGT and duplications.
With an alternative clustering method, the BY-kinases were
classified into three main groups and 14 subgroups, and we
provide in silico evidence that each subgroup contains
specific signatures. Our results indicate that BY-kinases
underwent fast evolution and maintain high conservation
only for the residues directly involved in structure formation
and catalytic activity. Analysis of the relationship between
BY-kinases and their substrates showed that there is no



Fig. 8. Continued

superposition of all BY-kinase structural models (S. pneumoniae, K.
pneumoniae, A. baumannii, B. animalis, L. mesenteroides, and B. subtilis) with
the structure from S. aureus, colored by secondary structure (truncated
from N-ter and C-ter, Core Residues: 164, Core RMSD: 1.371 Å). (C)
In vitro autophosphorylation of BY-kinases; 1 µM BY-kinases were
incubated without or with 1 µM TkmA. Bands corresponding to
autophosphorylated BY-kinases are indicated. Molecular weight of PtkA is
shown on the left. (D) Phosphorylation of Ugd by seven BY-kinases; 10
µM Ugd was incubated individually with the following kinases: 2 µM CapAB
from S. aureus, 2 µM CpsCD from S. pneumoniae, 2 µM Byk from
B. animalis, 2 µM Byk from L. mesenteroides, 1 µM of Byk from K.
pneumoniae, 1.5 µM of Byk from A. baumannii, or 0.8 µM of PtkA and TkmA
from B. subtilis. Bands corresponding to autophosphorylated BY-kinases
and phosphorylated Ugd are indicated. Molecular weights of Ugd and
PtkA are shown on the left. (E) Phosphorylation of YvyG and YjoA by
the seven BY-kinases; 10 µM of YvyG or YjoA were incubated with the
following kinases: 2 µM Byk from A. baumannii, 2 µM Byk from L.
mesenteroides, 1 µM of CapAB from S. aureus, 1 µM CpsCD from S.
pneumoniae, 1 µM Byk from K. pneumoniae, 1 µM Byk from B. animalis, 1 µM
PtkA and TkmA from B. subtilis. Bands corresponding to
autophosphorylated BY-kinases and phosphorylated YvyG and YjoA are indicated.
Molecular weights of YvyG, YjoA, and PtkA are shown on the left.

Fig. 9. Phosphorylation of Ugd by BY-kinases in the presence of Sb(III). (A) Phosphorylation of Ugd by the seven BY-kinases in the presence of different
concentrations of Sb(III); 10 µM Ugd was incubated individually with the following kinases: 2 µM of CapAB from Staphylococcus aureus, 2 µM CpsCD from
Streptococcus pneumoniae, 2 µM Byk from Bifidobacterium animalis, 2 µM Byk from Leuconostoc mesenteroides; 1 µM of Byk from Klebsiella pneumoniae;
1.5 µM of Byk from Acinetobacter baumannii; and 0.8 µM of PtkA and TkmA from Bacillus subtilis. The concentration of Sb(III) is indicated above each lane.
Bands corresponding to autophosphorylated BY-kinases and phosphorylated Ugd are indicated by arrows. Molecular weights of Ugd and BY-kinase are
shown on the left. (B) Comparison of monomeric ArsA (domains 1 and 2) and monomeric Wzc. Domains 1 and 2 are colored blue and cyan. Wzc is
colored red. Core residues: 139; Core RMSD: 3.311.



coevolution between the two, and BY-kinases retain the
ability to phosphorylate substrates from very distant bacterial
relatives. In Eukarya, the prereplicative complex (RC) is regulated
by the cyclin-dependent kinase (CDK) through
phosphorylation on CDK consensus sites. Interestingly, those CDK
consensus sites are not conserved in position or number.
Consequently, the regulation does not require precise
phosphorylation sites (Moses et al. 2007). We observed a
similar "turnover" event with respect to phosphorylation of
Ugd (Chen et al. 2011), suggesting that this phenomenon is
not restricted to Eukarya.

Bacteria are widely distributed in most habitats on earth,
even under some extreme conditions. They can adapt to frequently
and rapidly changing environments by adjusting their physiology
and behavior, for example through chemotaxis, phototaxis,
dormancy, and biofilm formation. Signal sensing and transduction
are the first steps of adaptive processes, and it has been shown
that protein phosphorylation plays a major role in signaling
pathways (Cohen 2000). Previous studies suggested that
BY-kinases are expressed as a stress response and modulate
cellular processes to promote survival (Vincent et al. 2000). Moreover,
BY-kinases are involved in different processes via phosphorylation
of different substrates (Klein et al. 2003; Mijakovic et al.
2003). This raises the question of how those substrates
emerged and continued to be modulated during evolution. Our
study suggests that BY-kinases evolve very rapidly and may
acquire a new substrate by recognizing its overall structure
rather than a particular short sequence or phosphorylation site.
Phosphorylation would then occur within a particular region
of the substrate and be selected for if it conveys any regulatory
potential.

Supplementary Material 

Supplementary figures S1-S7 and tables S1-S9 are available
at Genome Biology and Evolution online
(http://www.gbe.oxfordjournals.org/).
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